Search for: All records

Creators/Authors contains: "Yu, Jing"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

  1. Free, publicly-accessible full text available July 1, 2026
  2. Free, publicly-accessible full text available August 1, 2026
  3. Numerous sequence-based predictors of the amino acid (AA)-level solvent accessibility (SA) and secondary structure (SS) of proteins have been developed. We empirically investigated whether these two key characteristics of AA-level structure can be accurately predicted from putative structures generated by the popular AlphaFold2. We compared AlphaFold2's results against several representative SS and SA predictors on a large test dataset that covers five distinct taxonomic groups (animals, plants, fungi, bacteria, and archaea). We used a broad collection of metrics that evaluate predictions of the numeric and binary (buried vs. solvent exposed) SA and the 3-state SS at both AA- and SS-region levels. We found that AlphaFold2 generated very accurate results, with high average Q3 accuracy of 0.928 for the SS prediction and high Pearson Correlation Coefficient (PCC) of 0.815 between its putative and native SA values. AlphaFold2 significantly and consistently outperforms the considered predictors of SA and SS across the five taxonomic groups and both AA and region level evaluations. Moreover, we demonstrated that AlphaFold2 nearly perfectly reconstructs distributions of the sizes and numbers of the SS regions. We also showed that AlphaFold2 substantially improves over the SS and SA predictors when tested on a low sequence similarity test dataset, although its results and the results of two other predictors suffer a modest drop in the quality of predicting SS regions. Altogether, our results suggest that AlphaFold2 makes very accurate predictions of SS and SA, which can be easily extracted from the 200+ million pre-computed AlphaFold2 structure predictions in AlphaFoldDB.
    Free, publicly-accessible full text available May 29, 2026
  4. Abstract Computational prediction of nucleic acid-binding residues in protein sequences is an active field of research, with over 80 methods that were released in the past 2 decades. We identify and discuss 87 sequence-based predictors that include dozens of recently published methods that are surveyed for the first time. We overview historical progress and examine multiple practical issues that include availability and impact of predictors, key features of their predictive models, and important aspects related to their training and assessment. We observe that the past decade has brought increased use of deep neural networks and protein language models, which contributed to substantial gains in the predictive performance. We also highlight advancements in vital and challenging issues that include cross-predictions between deoxyribonucleic acid (DNA)-binding and ribonucleic acid (RNA)-binding residues and targeting the two distinct sources of binding annotations, structure-based versus intrinsic disorder-based. The methods trained on the structure-annotated interactions tend to perform poorly on the disorder-annotated binding and vice versa, with only a few methods that target and perform well across both annotation types. The cross-predictions are a significant problem, with some predictors of DNA-binding or RNA-binding residues indiscriminately predicting interactions with both nucleic acid types. Moreover, we show that methods with web servers are cited substantially more than tools without implementation or with no longer working implementations, motivating the development and long-term maintenance of the web servers. We close by discussing future research directions that aim to drive further progress in this area. 
  5. Abstract We identify the progenitor star of SN 2023ixf in Messier 101 using Keck/NIRC2 adaptive optics imaging and pre-explosion Hubble Space Telescope (HST)/Advanced Camera for Surveys (ACS) images. The supernova, localized with diffraction spikes and high-precision astrometry, unambiguously coincides with a progenitor candidate of $$m_\text{F814W}=24.87\pm 0.05$$ (AB). Given its reported infrared excess and semiregular variability, we fit a time-dependent spectral energy distribution (SED) model of a dusty red supergiant (RSG) to a combined data set of HST optical, ground-based near-infrared, and Spitzer Infrared Array Camera (IRAC) [3.6], [4.5] photometry. The progenitor resembles an RSG of $$T_\text{eff}=3488\pm 39$$ K and $$\log (L/\mathrm{L}_\odot)=5.15\pm 0.02$$, with a $$0.13\pm 0.01$$ dex ($$31.1\pm 1.7$$ per cent) luminosity variation at a period of $$P=1144.7\pm 4.8$$ d, obscured by a dusty envelope of $$\tau =2.92\pm 0.02$$ at $$1\, \mu \text{m}$$ in optical depth (or $$A_\text{V}=8.43\pm 0.11$$ mag). The signatures match a post-main-sequence star of $$18.2_{-0.6}^{+1.3}\, \mathrm{M}_\odot$$ in zero-age main-sequence mass, among the most massive SN II progenitors, with a pulsation-enhanced mass-loss rate of $$\dot{M}=(4.32\pm 0.26)\times 10^{-4} \, \mathrm{M}_\odot \, \text{yr}^{-1}$$. The dense and confined circumstellar material is ejected during the last episode of radial pulsation before the explosion. Notably, we find strong evidence for variations of $$\tau$$ or $$T_\text{eff}$$ along with luminosity, a necessary assumption to reproduce the wavelength-dependent variability, which implies periodic dust sublimation and condensation. Given the observed SED, partial dust obscuration remains possible, but any unobstructed binary companion over $$5.6\, \mathrm{M}_\odot$$ can be ruled out.
  6. Abstract The Bright Transient Survey (BTS) aims to obtain a classification spectrum for all bright (m_peak ≤ 18.5 mag) extragalactic transients found in the Zwicky Transient Facility (ZTF) public survey. BTS critically relies on visual inspection (“scanning”) to select targets for spectroscopic follow-up, which, while effective, has required a significant time investment over the past ∼5 yr of ZTF operations. We present BTSbot, a multimodal convolutional neural network, which provides a bright transient score to individual ZTF detections using their image data and 25 extracted features. BTSbot is able to eliminate the need for daily human scanning by automatically identifying and requesting spectroscopic follow-up observations of new bright transient candidates. BTSbot recovers all bright transients in our test split and performs on par with scanners in terms of identification speed (on average, ∼1 hr quicker than scanners). We also find that BTSbot is not significantly impacted by any data shift by comparing performance across a concealed test split and a sample of very recent BTS candidates. BTSbot has been integrated into Fritz and Kowalski, ZTF’s first-party marshal and alert broker, and now sends automatic spectroscopic follow-up requests for the new transients it identifies. Between 2023 December and 2024 May, BTSbot selected 609 sources in real time, 96% of which were real extragalactic transients. With BTSbot and other automation tools, the BTS workflow has produced the first fully automatic end-to-end discovery and classification of a transient, representing a significant reduction in the human time needed to scan.
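
The two AA-level metrics quoted in record 3 (Q3 accuracy for the 3-state SS and the Pearson correlation coefficient between putative and native SA values) can be illustrated with a minimal Python sketch. The function names, the H/E/C label alphabet, and the toy inputs are assumptions made for illustration; they are not taken from the paper and do not reproduce its evaluation pipeline or its reported values.

```python
from math import sqrt


def q3_accuracy(predicted: str, native: str) -> float:
    """Fraction of residues whose 3-state SS label matches the native label.

    Labels are assumed to be one character per residue (e.g. H, E, C).
    """
    if len(predicted) != len(native) or not native:
        raise ValueError("label strings must be non-empty and of equal length")
    matches = sum(p == n for p, n in zip(predicted, native))
    return matches / len(native)


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length value lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Toy example (hypothetical labels and SA values, not data from the paper).
predicted_ss = "HHHEECCC"
native_ss = "HHHEEECC"
print(q3_accuracy(predicted_ss, native_ss))

predicted_sa = [0.10, 0.45, 0.80, 0.30]
native_sa = [0.15, 0.40, 0.75, 0.35]
print(pearson(predicted_sa, native_sa))
```

In a per-protein evaluation, each function would be applied to one chain and the results averaged over the dataset; whether the paper averages per protein or pools residues is not stated in the abstract.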